Honor workflow-level Codex tuning in workflows #1215

Closed
matzls wants to merge 23 commits into coleam00:dev from matzls:codex/archon-skill-parity

matzls commented Apr 14, 2026

Summary

  • make workflow-level Codex tuning effective at runtime for normal and loop nodes
  • add regression coverage for Codex override, fallback, and mixed-provider loop preservation
  • align Codex Archon docs and docs-web guidance with the implemented precedence

Validation

  • bun test src/dag-executor.test.ts
  • bun x tsc --noEmit
  • git diff --check

Notes

  • workflow-level precedence is now: workflow YAML -> assistants.codex config -> SDK defaults
  • peer-reviewed with the Codex engine after implementation and after the final regression/doc follow-up
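The precedence described in the notes above can be sketched as a per-field fallback. This is a minimal illustration, not the executor's actual code: the `CodexTuning` interface, the `resolveCodexTuning` helper, and the sample values are assumptions; only the three field names appear in this PR.

```typescript
// Hypothetical shape: only the three field names come from the PR text.
interface CodexTuning {
  modelReasoningEffort?: string;
  webSearchMode?: string;
  additionalDirectories?: string[];
}

// Resolve each field independently: workflow YAML wins, then assistants.codex
// config. A field that stays undefined is not set, so the SDK default applies.
function resolveCodexTuning(workflow: CodexTuning, config: CodexTuning): CodexTuning {
  return {
    modelReasoningEffort: workflow.modelReasoningEffort ?? config.modelReasoningEffort,
    webSearchMode: workflow.webSearchMode ?? config.webSearchMode,
    additionalDirectories: workflow.additionalDirectories ?? config.additionalDirectories,
  };
}

// Illustrative values only; the accepted values are defined by the Codex SDK.
const resolvedTuning = resolveCodexTuning(
  { modelReasoningEffort: "high" },
  { modelReasoningEffort: "low", webSearchMode: "cached" },
);
console.log(resolvedTuning);
```

Resolving per field rather than per object means a workflow can override one setting without discarding the others from config.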

Summary by CodeRabbit

  • New Features

    • Added Codex-tuned assist workflow (archon-assist-codex) for general AI help and exploration
    • Introduced interactive human-in-the-loop Plan-Implement-Validate development workflow with user approval gates
    • Added automatic project detection to identify build tools and validation commands
    • Workflows now adapt to selected AI assistant type (Claude vs Codex)
  • Documentation

    • Comprehensive guides for workflow configuration, monitoring, debugging, and interactive operation
    • CLI reference and assistant architecture documentation
    • Troubleshooting and setup guidance

matzls and others added 23 commits April 11, 2026 11:03

Chat platform adapters (Telegram, Slack, Discord) in @archon/adapters are
pure transport and cannot call messageDb directly. Until now, only the Web
adapter's PersistenceBuffer and the HTTP routes persisted messages, leaving
telegram conversations with rows in remote_agent_conversations but zero rows
in remote_agent_messages. The Web UI then rendered these conversations as
empty.

Add four persistence hooks inside handleMessage, gated strictly on
platform.getPlatformType() === 'telegram' so the web path is completely
untouched:

1. User message persistence after conversation creation + title generation
   but BEFORE the natural-language approval gate, so approval responses are
   captured. !message.startsWith('/') excludes deterministic slash commands.
2. Stream-mode assistant persistence after parseOrchestratorCommands, inside
   the "no retract" branch, so retracted /invoke-workflow text is never saved
   (matches Web's MessagePersistence.retractLastSegment semantics).
3. Batch-mode assistant persistence after platform.sendMessage succeeds, with
   the same retract guard.
4. Top-level catch persistence for error responses, so orphan user rows
   without assistant counterparts can't appear in the Web UI view.

The conversation variable is hoisted out of the inner try block so the
catch handler can reference it. All persistence errors are logged and
swallowed — a DB hiccup must not break the user-facing Telegram reply.
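The user-message hook (item 1 above) reduces to a gate plus a swallow-on-error write. The sketch below assumes a hypothetical `MessageStore` interface standing in for messageDb, whose real API is not shown in this PR; the function names are illustrative as well.

```typescript
// Hypothetical store interface; the real code calls messageDb, whose API is not shown here.
interface MessageStore {
  addMessage(conversationId: string, role: "user" | "assistant", content: string): Promise<void>;
}

// Pure gate: strict 'telegram' check plus the coarse slash-command exclusion.
function shouldPersistUserMessage(platformType: string, message: string): boolean {
  return platformType === "telegram" && !message.startsWith("/");
}

// Persist if the gate passes; log and swallow errors so a DB hiccup
// never breaks the user-facing reply.
async function persistUserMessage(
  store: MessageStore,
  platformType: string,
  conversationId: string,
  message: string,
): Promise<void> {
  if (!shouldPersistUserMessage(platformType, message)) return;
  try {
    await store.addMessage(conversationId, "user", message);
  } catch (err) {
    console.error("telegram message persistence failed", err);
  }
}
```

Keeping the gate as a pure function makes the three load-bearing cases (natural language, slash command, web platform) testable without a database mock.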

Gate is strict 'telegram' for MVP. Broadening to Slack/Discord/GitHub will
require auditing those adapters' webhook replay behavior first.

Known MVP limitations (will file as follow-ups):
- Tool-call metadata not captured for telegram (web buffer still owns that)
- Workflow dispatch progress messages from dag-executor not captured
- Non-deterministic slash commands also excluded by the coarse startsWith('/')
  gate (acceptable — chat clients don't send ad-hoc slash commands)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The Web UI disables the message input for any conversation whose
platform_type is not 'web' because Scope B (bidirectional bridging) isn't
shipped yet. The old disabledReason string — "Continuing chats from other
platforms in the Web UI is coming soon" — was both vague and increasingly
misleading now that Telegram conversations render their full history in
the Web UI.

Replace the hardcoded string with a platform-keyed lookup map so each
platform gets a clear "reply from the originating app" hint. The disable
condition itself is unchanged; this is pure copy + a small constant.
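A platform-keyed lookup of this kind might look like the sketch below. The hint strings, the fallback text, and the `getDisabledReason` name are assumptions for illustration; the commit only states that each platform gets a "reply from the originating app" hint and that the disable condition is unchanged.

```typescript
// Hypothetical copy; the actual strings in ChatInterface.tsx are not shown in this PR.
const DISABLED_REASON_BY_PLATFORM = new Map<string, string>([
  ["telegram", "Reply from Telegram to continue this conversation."],
  ["slack", "Reply from Slack to continue this conversation."],
  ["discord", "Reply from Discord to continue this conversation."],
  ["github", "Reply from GitHub to continue this conversation."],
]);

// Assumed fallback for platforms missing from the map.
const FALLBACK_REASON = "Reply from the originating app to continue this conversation.";

// Returns undefined for 'web' (input stays enabled), otherwise a platform-specific hint.
function getDisabledReason(platformType: string): string | undefined {
  if (platformType === "web") return undefined;
  return DISABLED_REASON_BY_PLATFORM.get(platformType) ?? FALLBACK_REASON;
}
```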

Only telegram is functionally wired up (persistence hooks land in a
sibling commit); slack/discord/github entries are forward-compatible and
take effect as soon as persistence is broadened to those platforms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add a focused test suite for the telegram persistence gate added to
handleMessage. Covers the three load-bearing cases on the user-message
side of the gate:

1. Natural-language telegram messages persist exactly one row with role
   'user', the DB conversation id, the raw message text, and metadata
   { platformType: 'telegram' }.
2. Deterministic slash commands (/help) skip persistence entirely —
   neither user nor assistant rows are created.
3. Web-platform conversations do NOT trigger the centralized path, so
   web's existing MessagePersistence buffer still owns that flow.

Assistant-message persistence hooks (inside handleStreamMode,
handleBatchMode, and the top-level catch) require mocking sendQuery to
yield actual content, which needs a more elaborate mock setup than the
existing test file provides. Tracking that as a follow-up rather than
blocking the MVP on it — the user-persistence path is the primary new
logic and is covered here.

A new mock.module('../db/messages', ...) is added near the existing DB
mocks so that orchestrator-agent.ts's new messageDb import does not try
to open a real DB connection. orchestrator-agent.test.ts runs in its
own bun test invocation per packages/core/package.json, so the new
mock does not pollute sibling orchestrator tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This repo is a clone of coleam00/Archon and will evolve upstream as the
project moves through beta. To keep local customizations persistent
across upstream releases without merge chaos, the working copy is set up
fork-first: origin → matzls/Archon (push access), upstream →
coleam00/Archon (read-only).

Add a "Fork & Upstream Integration" section to CLAUDE.md next to the
existing Git Workflow content so future sessions have a single grounded
reference for:

- Which remote is which and what dev tracks
- Where different customization types belong (personal config vs.
  upstreamable code changes vs. personal code changes)
- The exact commands to integrate upstream releases (fetch + ff merge +
  rebase feature branches)
- The exact commands to contribute back via gh pr create

Intentionally short — this is routing guidance, not a git tutorial.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

9-node DAG PIV loop tuned for Codex behavioral tendencies:
numbered SIGNAL EMISSION CONTRACTs, task-scoped implement loop
(no repo-wide validators mid-task), pre-existing failure tolerance
in code-review, per-file git staging, and tightened COMPLETE signal.

Validated end-to-end on my-second-brain-build (Python/Obsidian vault)
with 32 pre-existing ruff violations — workflow correctly scoped fixes
to branch-introduced issues only.

Also add root-level artifacts/ to .gitignore (workflow runtime output).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Reuse the deterministic slash-command allowlist for Telegram user-message persistence so slash-prefixed AI prompts are stored while ephemeral commands still skip persistence.

Add a regression test covering /etc/hosts and stabilize the command-parser mock in the Telegram persistence test block.
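The difference between the coarse startsWith('/') gate and an allowlist check can be sketched as follows. The allowlist contents and the `isDeterministicCommand` name are assumptions; only /help and the /etc/hosts regression case are named in this PR, and the real list lives in the command parser.

```typescript
// Hypothetical allowlist; only /help is named in the PR text.
const DETERMINISTIC_COMMANDS = new Set<string>(["help"]);

// True only when the first token after '/' is a known deterministic command.
function isDeterministicCommand(message: string): boolean {
  if (!message.startsWith("/")) return false;
  const firstToken = message.slice(1).split(/\s+/, 1)[0] ?? "";
  return DETERMINISTIC_COMMANDS.has(firstToken);
}

// Slash-prefixed AI prompts such as "/etc/hosts looks wrong" are persisted,
// while ephemeral commands such as "/help" still skip persistence.
function shouldPersistSlashAware(message: string): boolean {
  return !isDeterministicCommand(message);
}
```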

Co-authored-by: Codex <noreply@openai.com>

Context: preserve and land the Codex-specific assist workflow onto current dev while keeping the newer telegram persistence behavior already present on dev.

Change:
- add the bundled archon-assist-codex command and workflow defaults plus the tracked Archon skill files they depend on
- default continue and orchestrator assist routing to archon-assist-codex when the assistant type is codex
- extend server, web, docs, and core test coverage for the new workflow and the assistant-aware prompt-builder signatures
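The assistant-aware routing in the change list above comes down to a small selector. The workflow names and the getAssistWorkflowName() helper name appear in this PR; the signature and the `AssistantType` union below are assumptions.

```typescript
// Assumed union; the PR only distinguishes Claude and Codex.
type AssistantType = "claude" | "codex";

// Pick the Codex-tuned assist workflow for Codex, the standard one otherwise.
function getAssistWorkflowName(assistantType: AssistantType): string {
  return assistantType === "codex" ? "archon-assist-codex" : "archon-assist";
}
```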

Validation:
- bun test packages/cli/src/commands/continue.test.ts
- bun test packages/core/src/orchestrator/prompt-builder.test.ts
- bun test packages/core/src/orchestrator/orchestrator.test.ts
- bun test packages/server/src/routes/api.health.test.ts
- bun test packages/server/src/routes/api.workflows.test.ts
- bun test packages/web/src/lib/workflow-metadata.test.ts
- bun test packages/workflows/src/defaults/bundled-defaults.test.ts
- bun --filter @archon/cli type-check
- bun --filter @archon/core type-check
- bun --filter @archon/server type-check
- bun --filter @archon/workflows type-check
- bun --filter @archon/web type-check
- bun run validate

Codex-Session: 019d80c8-3cb7-79b1-8443-d09a42cb5020
Codex-Rollout: sessions/2026/04/12/rollout-2026-04-12T10-21-39-019d80c8-3cb7-79b1-8443-d09a42cb5020.jsonl
Co-authored-by: Codex <noreply@openai.com>

Extract the detect-project workflow node into a reusable Bun script while preserving its stdout contract. Also tighten the Codex loop prompts so feedback fixes use per-file staging, scoped validation stays tool-accurate, and iteration violations fail without rewriting history.

Extend typed ESLint coverage to .archon/scripts so the new script participates in the existing pre-commit checks.

Co-authored-by: Codex <noreply@openai.com>

Document how assistant selection works across host skills, conversations, workflows, and nodes. Capture fork-specific Codex additions, upstream differences, and the current Codex limitations for workflow nodes.

Co-authored-by: Codex <noreply@openai.com>

Preview upstream sync onto custom dev; auto-merged cleanly and passed bun run validate.

Co-authored-by: Codex <noreply@openai.com>

Document the phase-by-phase session model for archon-piv-loop-codex,
including fresh-context boundaries, interactive loop resume behavior,
and a flow diagram for future review.

Co-authored-by: Codex <noreply@openai.com>

Promote archon-piv-loop-codex into the bundled default workflow set and update the overview docs to advertise it there.

Also fix the execution-notes path references after the workflow moved under .archon/workflows/defaults.

Co-authored-by: Codex <noreply@openai.com>

Exclude CLI-backed workflow runs from the server startup orphan-failure sweep and document the distinction in the workflow authoring guide.

This keeps server restarts from marking active CLI executions failed while they continue in a separate process against the same database.
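The orphan sweep exclusion can be pictured as a filter over running rows. The `WorkflowRun` shape, in particular the `owner` field, is a hypothetical stand-in; the PR does not show how failOrphanedRuns identifies CLI-owned runs.

```typescript
// Hypothetical row shape; the real schema and ownership marker are not shown in this PR.
interface WorkflowRun {
  id: string;
  status: "running" | "completed" | "failed";
  owner: "server" | "cli";
}

// On server startup, only server-owned runs still marked 'running' are orphans.
// CLI-owned runs may still be executing in a separate process against the same database.
function selectOrphanedRuns(runs: WorkflowRun[]): WorkflowRun[] {
  return runs.filter((run) => run.status === "running" && run.owner !== "cli");
}
```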

Co-authored-by: Codex <noreply@openai.com>

Add a reusable reference for repeated Archon log-debugging sessions and surface it from the Codex assist lane and the top-level Archon skill routing table.

The new guide explains the three log layers, run discovery, JSONL filtering, event interpretation, and when to use UI or raw logs.

Co-authored-by: Codex <noreply@openai.com>

- bundle the detect-project helper as a default script and resolve Archon default scripts when repo-local scripts are absent
- stop PIV loop nodes early when git HEAD and task-progress tracking stop advancing
- fail workflow CLI commands early when ~/.archon is not writable and clarify the sandbox failure mode in docs
- persist richer DAG failure metadata for partial-run diagnostics
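The early-stop behavior for PIV loop nodes can be sketched as a counter of consecutive iterations with no change in git HEAD or tracked progress. The `ProgressSnapshot` shape and `createStuckDetector` helper are assumptions; the `limit` argument plays the role of the stuck_after_no_progress_iterations loop setting, and `progress` stands for whatever is read from the progress_file.

```typescript
// Hypothetical snapshot taken after each loop iteration.
interface ProgressSnapshot {
  gitHead: string;   // e.g. the commit hash after the iteration
  progress: string;  // e.g. contents or digest of the progress file
}

// Returns a recorder that reports true once `limit` consecutive
// iterations have passed without either value advancing.
function createStuckDetector(limit: number): (snapshot: ProgressSnapshot) => boolean {
  let last: ProgressSnapshot | undefined;
  let stalledIterations = 0;
  return (snapshot: ProgressSnapshot): boolean => {
    const advanced =
      last === undefined ||
      snapshot.gitHead !== last.gitHead ||
      snapshot.progress !== last.progress;
    stalledIterations = advanced ? 0 : stalledIterations + 1;
    last = snapshot;
    return stalledIterations >= limit;
  };
}
```

Any advance in either signal resets the counter, so a loop that commits slowly but steadily is not flagged.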

Co-authored-by: Codex <noreply@openai.com>

- add routing guidance for monitoring, interactive relays, and log debugging in the Archon skill
- add focused references for workflow monitoring cadence, paused-run relay behavior, and JSONL-first debugging
- keep Archon follow-up handling grounded in the run status and per-run logs

Co-authored-by: Codex <noreply@openai.com>

Add the generated PRD for workflow node display names to the Archon repo under docs/prd.

The document keeps one PRD with a small execution-graph-only phase 1 and defers builder, non-graph execution surfaces, inference, and historical-fidelity questions to phase 2.

Co-authored-by: Codex <noreply@openai.com>

Add the fork-level design doc that defines the Codex-first workflow surface, decision rules, and follow-on implementation sequence.

Co-authored-by: Codex <noreply@openai.com>

Refine the Archon Codex skill and assist command so substantial implementation work routes to the Codex PIV lane, and add explicit worktree-proof/readback guardrails for assist-mode edits.

Co-authored-by: Codex <noreply@openai.com>

Add a full Codex-first operator and authoring surface for the Archon skill, including workflow monitoring, debugging, repo init, command authoring, DAG authoring, CLI references, configuration guidance, and a Codex capability crosswalk.

Correct the documented Codex parity boundaries so loop model/provider overrides are described accurately and workflow-level Codex tuning fields are called out as parsed but not runtime-effective per workflow in the current executor.

Validation:
- git diff --check
- archon workflow list --json

Co-authored-by: Codex <noreply@openai.com>

Make modelReasoningEffort, webSearchMode, and additionalDirectories effective from workflow YAML for Codex execution, with config fallback for normal and loop nodes.

Add regression coverage for override, fallback, and mixed-provider loop preservation. Update Codex Archon references to match the implemented precedence.

Co-authored-by: Codex <noreply@openai.com>

Update the assistant architecture reference to reflect that workflow-level Codex tuning fields now override Archon config with config fallback, matching the shipped runtime behavior.

Co-authored-by: Codex <noreply@openai.com>

coderabbitai Bot commented Apr 14, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This pull request introduces a comprehensive Codex-first workflow infrastructure, including two new bundled Codex-tuned workflows (archon-assist-codex, archon-piv-loop-codex), extensive reference documentation for workflow authoring and operation, support for bundled scripts with Codex execution, loop progress tracking with stuck-loop detection, and dynamic workflow selection based on assistant type in the CLI and orchestrator.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Codex Skills & Documentation | .agents/skills/archon/SKILL.md, .agents/skills/archon/agents/openai.yaml | Added Archon skill definition with implicit invocation support and display configuration for Codex routing surface. |
| Codex Reference Guides | .agents/skills/archon/references/archoring-commands.md, cli-commands.md, codex-capability-crosswalk.md, configuration.md, interactive-workflows.md, log-debugging.md, monitoring.md, repo-init.md, variables.md, workflow-dag.md | Comprehensive reference documentation covering workflow authoring, CLI operations, Codex capability mapping, configuration precedence, interactive loop control, debugging, monitoring, repo initialization, variable substitution, and DAG structure. |
| Codex Examples | .agents/skills/archon/examples/command-template.md, dag-workflow.yaml | Command and workflow templates demonstrating Load/Execute/Report phases and DAG node patterns for Codex workflows. |
| Claude Skills | .claude/skills/archon/SKILL.md, .claude/skills/archon/references/log-debugging.md | Updated routing entry and detailed logging/debugging reference for Claude-facing users. |
| Bundled Codex Workflows & Commands | .archon/workflows/defaults/archon-assist-codex.yaml, archon-piv-loop-codex.yaml, archon-piv-loop-codex.README.md, .archon/commands/defaults/archon-assist-codex.md | New Codex-tuned catch-all assist workflow, full PIV (Plan-Implement-Validate) human-in-the-loop workflow with promise gates and session management, and fallback assist command with structured debugging guidance. |
| Bundled Script Discovery | .archon/scripts/detect-project.ts, tsconfig.json | New project-type detection script supporting Bun, Node, Python, Go, Rust, and Makefile projects with environment-specific validation/install commands. |
| Workflow Execution Core | packages/workflows/src/dag-executor.ts, script-discovery.ts, validator.ts, schemas/loop.ts | Enhanced DAG executor with: Codex tuning options propagation, bundled script resolution/execution, loop progress tracking with stuck-iteration detection, script error metadata tracking. Script discovery now merges repo and bundled defaults; loop schema adds progress_file and stuck_after_no_progress_iterations. |
| Bundled Defaults Registry | packages/workflows/src/defaults/bundled-defaults.ts, bundled-defaults.test.ts, script-discovery.test.ts, loader.test.ts, validator.test.ts, dag-executor.test.ts | Registered new Codex workflows/commands and detect-project script in bundled assets; extensive test coverage for script bundling, loop stuck detection, and provider-level Codex tuning. |
| CLI Assistant Selection | packages/cli/src/commands/continue.ts, continue.test.ts | Dynamic workflow selection: continueCommand now chooses archon-assist-codex for Codex assistants and archon-assist for Claude, with fallback to config settings. |
| CLI & Workflow Foundation | packages/cli/src/commands/workflow.ts, workflow.test.ts, cli.ts, package.json | Added pre-flight SQLite write-access guard for state-mutating workflow commands; updated help text and test references for new continue tests. |
| Database & Metadata | packages/core/src/db/workflows.ts, workflows.test.ts | Updated failWorkflowRun to accept optional metadata object (for node_counts/failed_nodes) and failOrphanedRuns to exclude CLI-owned runs from auto-failure on restart. |
| Orchestrator & Prompt Building | packages/core/src/orchestrator/orchestrator-agent.ts, orchestrator-agent.test.ts, orchestrator.test.ts, prompt-builder.ts, prompt-builder.test.ts | Added Telegram user-message persistence, assistant-type threading through prompt builders, new getAssistWorkflowName() helper selecting Codex/Claude assist workflow, and routing-rule interpolation with correct workflow names. |
| Path Utilities | packages/paths/src/archon-paths.ts, index.ts | New getDefaultScriptsPath() export for accessing bundled scripts directory. |
| Web API & Documentation | packages/server/src/routes/api.health.test.ts, api.workflows.test.ts, packages/web/src/components/chat/ChatInterface.tsx, lib/workflow-metadata.test.ts | API mocks updated to include archon-assist-codex bundled defaults; platform-specific reply hints for non-web conversations; workflow display name/category parsing for -codex suffix. |
| Public Docs | packages/docs-web/src/content/docs/book/essential-workflows.md, first-five-minutes.md, quick-reference.md, getting-started/overview.md, guides/authoring-workflows.md, guides/index.md, guides/loop-nodes.md, reference/assistant-architecture.md, reference/cli.md, reference/index.md, reference/troubleshooting.md | Comprehensive documentation additions covering Codex workflow variants, loop progress/stuck settings, assistant architecture and provider selection, CLI name resolution with -codex suffix, and SQLite write-access requirements. |
| Design & Policy | docs/design/codex-first-workflow-surface-strategy.md, docs/prd/workflow-node-display-names.prd.md, CLAUDE.md | Codex-first design philosophy doc establishing curated asset policy and capability crosswalk; node display-name PRD; fork upstream integration guidance. |
| Build Configuration | .gitignore, eslint.config.mjs | Updated ignore patterns to allow .agents/skills/** and artifacts/ directory; extended ESLint type-checking to .archon/scripts/**. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Scripts hop from bundled defaults, workflows dance in Codex lanes,
Loop promises gate the progress, stuck detection stops the pain.
From assist to piv to debug—each surface shines and true,
This fork finds its footing, Codex-first through and through! 🎯

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 59.68% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description is incomplete relative to the template, missing critical sections like UX journey, architecture diagrams, and required validation/risk/compatibility details. | Complete the description by adding UX journey flows, architecture diagrams, module connections, label snapshot, validation evidence, security/compatibility/side-effect/rollback analysis, and verified scenarios beyond CI. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: implementing runtime support for workflow-level Codex tuning configuration, which is the primary objective of this PR. |

Wirasm (Collaborator) commented Apr 14, 2026

Thank you for contributing, but this PR is way too large to review. Please reopen it as smaller pieces of work.

Wirasm closed this Apr 14, 2026
matzls deleted the codex/archon-skill-parity branch April 14, 2026 13:05